Categorization of Unorganized Text Corpora for better Domain-Specific Language Modeling



Similar resources

Cross Language Text Categorization by Acquiring Multilingual Domain Models from Comparable Corpora

In a multilingual scenario, the classical monolingual text categorization problem can be reformulated as a cross language TC task, in which we have to cope with two or more languages (e.g. English and Italian). In this setting, the system is trained using labeled examples in a source language (e.g. English), and it classifies documents in a different target language (e.g. Italian). In this pape...


Domain and Language Independent Feature Extraction for Statistical Text Categorization

A generic system for text categorization is presented which uses a representative text corpus to adapt the processing steps: feature extraction, dimension reduction, and classification. Feature extraction automatically learns features from the corpus by reducing actual word forms using statistical information of the corpus and general linguistic knowledge. The dimension of feature vector is the...


Domain Kernels for Text Categorization

In this paper we propose and evaluate a technique to perform semi-supervised learning for Text Categorization. In particular we defined a kernel function, namely the Domain Kernel, that allowed us to plug “external knowledge” into the supervised learning process. External knowledge is acquired from unlabeled data in a totally unsupervised way, and it is represented by means of Domain Models. We...


A Domain Specific Modeling Language for REA

The Resource-Event-Agent (REA) ontology has its roots in the accounting discipline and was originally developed as a reference framework to conceptualize economic phenomena in an enterprise. In its proposal in 1982, McCarthy already had the vision to facilitate the design of data structures of accounting information systems by means of REA [1]. Since this time the REA model has been further ext...


Toward Categorization of Sign Language Corpora

This paper addresses the notion of parallel, noisy parallel and comparable corpora in the sign language research field. As it is quite a new field, the categorization of sign language corpora is not well established, and does not rely on a straightforward basis. Nevertheless, several kinds of corpora are now available and could raise interesting issues, provided that adapted tools and technique...



Journal

Journal title: Advances in Electrical and Electronic Engineering

Year: 2013

ISSN: 1804-3119, 1336-1376

DOI: 10.15598/aeee.v11i5.897